Application: Differentiable Programming
Training a neural network means optimizing the parameters that control its behavior with respect to a loss function. The optimization algorithms commonly used for this are based on gradient descent and therefore require the gradient of the loss function with respect to those parameters. This means that we need to differentiate the neural network. While this could in principle be done by hand, automatic differentiation (AD) computes the derivative of a given program without additional programming effort. Because AD is not restricted to the specific operations used by typical neural networks, more complex constructs, for example ones involving control flow, can be employed in machine learning, as long as the program remains differentiable and has trainable parameters. This approach, which generalizes from deep neural networks to a broader class of programs, has been called differentiable programming.
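To make the idea concrete, the following is a minimal sketch of forward-mode AD using dual numbers in plain Python. It is an illustration under stated assumptions, not the method of any particular AD system: the names Dual, derivative, model, loss, and train are hypothetical, as are the toy program, the input, the target, and the learning rate. The sketch differentiates a small program containing a loop and a data-dependent branch, and then fits its single parameter by gradient descent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative w.r.t. one chosen input."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def derivative(f, x):
    """Derivative of f at x, obtained by seeding the input with dot = 1."""
    return f(Dual(float(x), 1.0)).dot


def model(w, x):
    """Hypothetical program with a loop and a data-dependent branch."""
    y = _lift(x)
    for _ in range(3):
        if y.val > 1.0:   # control flow depends on the current value
            y = w * y
        else:
            y = y + w
    return y


def loss(w, x=0.5, target=3.0):
    """Squared error of the program's output on one assumed data point."""
    err = model(w, x) - target
    return err * err


def train(w0=1.0, lr=0.01, steps=50):
    """Plain gradient descent on the single trainable parameter w."""
    w = w0
    for _ in range(steps):
        w -= lr * derivative(loss, w)
    return w
```

Calling train() returns the fitted parameter, and loss(Dual(w)).val evaluates the loss at that value. Note that the derivative is taken along the branch that is actually executed for the current input, so the result is meaningful only where the program is differentiable, that is, away from the points at which the branch condition switches.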